Enhanced Recognition of Transmembrane Protein Domains with Prediction-based Structural Profiles

Authors

  • Baoqiang Cao
  • Aleksey Porollo
  • Rafal Adamczak
  • Mark Jarrell
  • Jaroslaw Meller
Abstract

In addition to a simple NN-based classifier developed and assessed in the cross-validation study (see the Systems and Methods section of the main body of the paper), we also developed a multistage protocol for enhanced prediction of transmembrane (TM) helices. For the final predictor we do not consider the MA-based representation, which cross-validation shows to yield lower accuracy than the proposed compact prediction-based "structural profiles". We also excluded hydropathy scales from the representation of an amino acid residue, as they were observed to lead to a higher level of confusion with globular proteins and signal peptides. Moreover, statistical propensities of amino acids (e.g. for lipid-aqueous phase interfaces) that could, in principle, be used to improve the prediction of TM segments are not taken into account at this stage. Consequently, each residue is initially represented by five numbers: the predicted real-valued RSA, the confidence of the RSA prediction, and the probabilities of each of the three secondary structures (as predicted by SABLE, http://sable.cchmc.org). SABLE predictions are derived from the multiple alignment, hydropathy scales and other attributes that are commonly used by other state-of-the-art methods. Therefore, other accurate methods for RSA and SS prediction are expected to be useful in that regard as well. This is a subject for future investigation. Following in the footsteps of other studies (Rost et al., 1995), we use a two-stage prediction system, with the second layer (structure-to-structure) NNs allowing one to "average" and smooth over the initial classification obtained using the first (sequence-to-structure) layer predictor. The architecture of the first and second layer NNs is similar to that used for the cross-validation study. Namely, a simple feed-forward topology with one hidden layer, fully interconnected with the input and output layers, is employed.
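The five-number residue representation and the two-stage scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature order, the zero padding at the chain termini, the sigmoid activation, the omission of bias terms, and all function names (`sliding_windows`, `feed_forward`, `two_stage`) are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Assumed per-residue feature order:
# (RSA, RSA confidence, P(helix), P(strand), P(coil))
Network = Tuple[List[List[float]], List[float]]  # (hidden weights, output weights)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sliding_windows(profile: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Return one flattened input vector per residue.

    Positions that fall outside the chain are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    n_feat = len(profile[0]) if profile else 0
    pad = [0.0] * n_feat
    vectors = []
    for i in range(len(profile)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else pad)
        vectors.append(vec)
    return vectors


def feed_forward(x: Sequence[float], net: Network) -> float:
    """Fully connected network with one hidden layer and a single output.

    Bias terms are omitted to keep the sketch short.
    """
    w_hidden, w_out = net
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def two_stage(profile, net1: Network, net2: Network, window1: int, window2: int) -> List[float]:
    """First layer maps structural profiles to per-residue scores;
    the second layer smooths a window of those scores."""
    stage1 = [feed_forward(x, net1) for x in sliding_windows(profile, window1)]
    stage2_inputs = sliding_windows([[s] for s in stage1], window2)
    return [feed_forward(x, net2) for x in stage2_inputs]
```

With a window of 3 and five features per residue, each first-layer input vector has 15 entries; the second layer here sees only a window of first-layer scores, which is what lets it smooth isolated misclassifications.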
The choice of the sliding window size, the number of nodes in the hidden layer, training protocols and other characteristics of these NNs are discussed below. To reduce the danger of overfitting and to achieve regularization as well as an improvement in accuracy, a consensus of twenty different networks was used to generate predictions at each stage. These networks were trained on different subsets of the training set, the same subsets that were used for the cross-validation study. Multiple NNs were trained on each subset of the data, with different numbers of nodes in the hidden layer and with different sizes of the sliding …
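The consensus step mentioned above can be sketched as follows. The text available here does not say how the twenty network outputs are combined, so simple per-residue averaging followed by a 0.5 threshold is assumed; the names `consensus` and `call_tm` are hypothetical.

```python
from typing import List, Sequence


def consensus(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue scores over several networks.

    `predictions[k][i]` is the score of network k for residue i.
    Plain averaging is an assumption, not the documented protocol.
    """
    if not predictions:
        raise ValueError("at least one network prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all networks must score the same number of residues")
    return [sum(column) / len(predictions) for column in zip(*predictions)]


def call_tm(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Label a residue as transmembrane when its consensus score reaches the threshold."""
    return [s >= threshold for s in scores]
```

Averaging over networks trained on different subsets, with different hidden-layer sizes and window sizes, reduces the variance of any single network, which is one common way such a consensus acts as a regularizer.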

Similar resources

Molecular Insight into the Mutual Interactions of Two Transmembrane Domains of Human Glycine Receptor (TM23-GlyR), with the Lipid Bilayers

Acting as a computational microscope, molecular dynamics (MD) simulation can "zoom in" to atomic resolution to assess detailed interactions of a membrane protein with its surrounding lipids, which play important roles in the stability and function of such proteins. This study employed MD simulations to determine the effect of added DMPC or DMTAP molecules on the structure of D...

Enhanced recognition of protein transmembrane domains with prediction-based structural profiles

MOTIVATION: Membrane domain prediction has recently been re-evaluated by several groups, suggesting that the accuracy of existing methods is still rather limited. In this work, we revisit this problem and propose novel methods for prediction of alpha-helical as well as beta-sheet transmembrane (TM) domains. The new approach is based on a compact representation of an amino acid residue and its en...

PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they ofte...

A knowledge-based scale for the analysis and prediction of buried and exposed faces of transmembrane domain proteins

MOTIVATION: The dearth of structural data on alpha-helical membrane proteins (MPs) has hampered thus far the development of reliable knowledge-based potentials that can be used for automatic prediction of transmembrane (TM) protein structure. While algorithms for identifying TM segments are available, modeling of the TM domains of alpha-helical MPs involves assembling the segments into a bundle....

The use of functional domains to improve transmembrane protein topology prediction

Transmembrane proteins affect vital cellular functions and pathogenesis, and are a focus of drug design. It is difficult to obtain diffraction quality crystals to study transmembrane protein structure. Computational tools for transmembrane protein topology prediction fill in the gap between the abundance of transmembrane proteins and the scarcity of known membrane protein structures. Their pred...

Publication date: 2006